Exploratory Subgroup Identification with ForestSearch

German Breast Cancer Study Group (GBSG) Analysis

Author

Larry F. León

Published

February 9, 2026

1 Introduction

This vignette demonstrates the ForestSearch methodology for exploratory subgroup identification in survival analysis, as described in León et al. (2024) Statistics in Medicine.

1.1 Motivation

In clinical trials, particularly oncology, subgroup analyses are essential for:

  • Evaluating treatment effect consistency across patient populations
  • Identifying subgroups where treatment may be detrimental (harm)
  • Characterizing subgroups with enhanced benefit
  • Informing regulatory decisions and clinical practice

While prespecified subgroups provide stronger evidence, important subgroups based on patient characteristics may not be anticipated. ForestSearch provides a principled approach to exploratory subgroup identification with proper statistical inference.

1.2 Methodology Overview

ForestSearch identifies subgroups through:

  1. Candidate factor selection: Using LASSO and/or Generalized Random Forests (GRF)
  2. Exhaustive subgroup search: Evaluating all combinations up to maxk factors
  3. Consistency-based selection: Applying splitting consistency criteria
  4. Bootstrap bias correction: Adjusting for selection-induced optimism
  5. Cross-validation: Assessing algorithm stability

The key innovation is the splitting consistency criterion: a subgroup is considered “consistent with harm” if, when randomly split 50/50 many times, both halves consistently show hazard ratios ≥ 1.0 (for example if 1.0 represents a meaningful “harm threshold”).

2 Setup

2.1 Load Required Packages

Show code
library(forestsearch)
library(survival)
library(data.table)
library(ggplot2)
library(gt)
library(grf)
library(policytree)
library(doFuture)
library(doRNG)

# Optional packages for enhanced output
library(patchwork)
library(weightedsurv)

# Set ggplot theme
theme_set(theme_minimal(base_size = 12))

3 Data: German Breast Cancer Study Group Trial

3.1 Study Background

The GBSG trial evaluated hormonal treatment (tamoxifen) versus chemotherapy in node-positive breast cancer patients. Key characteristics:

  • Sample size: N = 686
  • Outcome: Recurrence-free survival time
  • Censoring rate: ~56%
  • Treatment: Hormonal therapy (tamoxifen) vs. chemotherapy

3.2 Data Preparation

Show code
# Load GBSG data (included in forestsearch package)
df.analysis <- gbsg

# Prepare analysis variables
df.analysis <- within(df.analysis, {
  id <- seq_len(nrow(df.analysis))
  time_months <- rfstime / 30.4375
  grade3 <- ifelse(grade == "3", 1, 0)
  treat <- hormon
})

# Define variable roles
confounders.name <- c("age", "meno", "size", "grade3", "nodes", "pgr", "er")
outcome.name <- "time_months"
event.name <- "status"
id.name <- "id"
treat.name <- "hormon"

# Display data structure
cat("Sample size:", nrow(df.analysis), "\n")
Sample size: 686 
Show code
cat("Events:", sum(df.analysis[[event.name]]), 
    sprintf("(%.1f%%)\n", 100 * mean(df.analysis[[event.name]])))
Events: 299 (43.6%)
Show code
cat("Baseline factors:", paste(confounders.name, collapse = ", "), "\n")
Baseline factors: age, meno, size, grade3, nodes, pgr, er 

3.3 Baseline Characteristics

Show code
create_summary_table(
  data = df.analysis,
  treat_var = treat.name,
  table_title = "GBSG Baseline Characteristics by Treatment Arm",
  vars_continuous = c("age", "nodes", "size", "er", "pgr"),
  vars_categorical = c("grade", "meno"),
  font_size = 12
)
GBSG Baseline Characteristics by Treatment Arm
Characteristic Control (n=440) Treatment (n=246) P-value1 SMD2
age Mean (SD) 51.1 (10.0) 56.6 (9.4) <0.001 0.57
nodes Mean (SD) 4.9 (5.6) 5.1 (5.3) 0.665 0.03
size Mean (SD) 29.6 (14.4) 28.8 (14.1) 0.470 0.06
er Mean (SD) 79.7 (124.2) 125.8 (191.1) <0.001 0.30
pgr Mean (SD) 102.0 (170.0) 124.3 (249.7) 0.213 0.11
grade 0.273 0.06
1 48 (10.9%) 33 (13.4%)
2 281 (63.9%) 163 (66.3%)
3 111 (25.2%) 50 (20.3%)
meno <0.001 0.27
0 231 (52.5%) 59 (24.0%)
1 209 (47.5%) 187 (76.0%)
1 P-values: t-test for continuous, chi-square/Fisher's exact for categorical/binary variables
2 SMD = Standardized mean difference (Cohen's d for continuous, Cramer's V for categorical)

3.4 Kaplan-Meier Analysis (ITT Population)

Show code
# Prepare counting process data for KM plot
dfcount <- df_counting(
  df = df.analysis,
  by.risk = 6,
  tte.name = outcome.name,
  event.name = event.name,
  treat.name = treat.name
)

# Plot with confidence intervals and log-rank test
plot_weighted_km(
  dfcount,
  conf.int = TRUE,
  show.logrank = TRUE,
  ymax = 1.05,
  xmed.fraction = 0.775,
  ymed.offset = 0.125
)

The ITT Cox hazard ratio estimate is approximately 0.69 (95% CI: 0.54, 0.89), suggesting an overall benefit for hormonal therapy.

4 Preliminary Analysis: Generalized Random Forests

Before running ForestSearch, we can use GRF to explore potential treatment effect heterogeneity and identify candidate factors.

Show code
t0 <- proc.time()

grf_est <- grf.subg.harm.survival(
  data = df.analysis,
  confounders.name = confounders.name,
  outcome.name = outcome.name,
  event.name = event.name,
  id.name = id.name,
  treat.name = treat.name,
  maxdepth = 2,
  n.min = 60,
  dmin.grf = 12,
  frac.tau = 0.6,
  details = TRUE,
  return_selected_cuts_only = FALSE
)
tau, maxdepth = 46.75811 2 
   leaf.node control.mean control.size control.se depth
1          2         6.49        82.00       3.34     1
2          3        -4.10       604.00       1.06     1
11         4        -7.90       112.00       2.81     2
21         5         3.86       177.00       1.87     2
4          7        -5.89       356.00       1.33     2

Selected subgroup:
  leaf.node control.mean control.size control.se depth
1         2         6.49        82.00       3.34     1

GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "er <= 0"

All splits (from all trees):
[1] "er <= 0"   "age <= 50" "age <= 43"
Show code
timings$grf <- (proc.time() - t0)["elapsed"]
Show code
# Display policy trees
# leaf1 = recommend control, leaf2 = recommend treatment
par(mfrow = c(1, 2))
plot(grf_est$tree1, leaf.labels = c("Control", "Treat"), main = "Depth 1")
Show code
plot(grf_est$tree2, leaf.labels = c("Control", "Treat"), main = "Depth 2")
Show code
par(mfrow = c(1, 1))

GRF identifies estrogen receptor status (ER) as a key factor, with ER ≤ 0 suggesting potential harm from hormonal therapy.

5 ForestSearch Analysis

5.1 Parallel Processing Configuration

ForestSearch supports parallel processing for computationally intensive operations (bootstrap, cross-validation).

Show code
# Detect available cores (use fewer for CRAN checks)
n_cores <- min(parallel::detectCores() - 1, 20)
cat("Using", n_cores, "cores for parallel processing\n")
Using 13 cores for parallel processing
Show code
# Configure parallel backend
registerDoFuture()
registerDoRNG()

5.2 Running ForestSearch

ForestSearch performs an exhaustive search over candidate subgroup combinations with up to maxk factors. Key parameters:

Parameter Value Description
hr.threshold 1.25 Minimum HR for consistency evaluation
hr.consistency 1.0 Minimum consistency rate for candidates
pconsistency.threshold 0.90 Required consistency for selection
maxk 2 Maximum factors in subgroup definition
n.min 60 Minimum subgroup sample size
d0.min, d1.min 12 Minimum events per treatment arm
Show code
t0 <- proc.time()

fs <- forestsearch(
  df.analysis,
  confounders.name = confounders.name,
  outcome.name = outcome.name,
  treat.name = treat.name,
  event.name = event.name,
  id.name = id.name,
  # Threshold parameters (per León et al. 2024)
  hr.threshold = 1.25,
  hr.consistency = 1.0,
  pconsistency.threshold = 0.80,
  stop_threshold = 0.80,
  # Search configuration
  sg_focus = "hr",
  max_subgroups_search = 5,
  use_twostage = TRUE,
  # Factor selection
  use_grf = TRUE, 
  return_selected_cuts_only = TRUE,
  use_lasso = TRUE,
  cut_type = "default",
  # Subgroup constraints
  maxk = 2,
  n.min = 60,
  d0.min = 10,
  d1.min = 10,
  # Consistency evaluation
  fs.splits = 100,
  # Parallel processing
  parallel_args = list(
    plan = "multisession",
    workers = n_cores,
    show_message = TRUE
  ),
  # Output options
  showten_subgroups = TRUE,
  details = TRUE,
  plot.sg = TRUE
)

=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30 
Maximum total splits: 100 
Batch size: 20 
================================================

GRF stage for cut selection with dmin, tau = 12 0.6 
  return_selected_cuts_only = TRUE: using cuts from selected tree only
tau, maxdepth = 46.75811 2 
   leaf.node control.mean control.size control.se depth
1          2         6.49        82.00       3.34     1
2          3        -4.10       604.00       1.06     1
11         4        -7.90       112.00       2.81     2
21         5         3.86       177.00       1.87     2
4          7        -5.89       356.00       1.33     2

Selected subgroup:
  leaf.node control.mean control.size control.se depth
1         2         6.49        82.00       3.34     1

GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "er <= 0"

Cuts from selected tree (depth = 1 ):
[1] "er <= 0"
GRF cuts identified: 1 
  Cuts: er <= 0 
  Selected tree depth: 1 
# of continuous/categorical characteristics 5 2 
Continuous characteristics: age size nodes pgr er 
Categorical characteristics: meno grade3 
## Prior to lasso: age size nodes pgr er 
#### Lasso selection results 
7 x 1 sparse Matrix of class "dgCMatrix"
                 s0
age     .          
meno    .          
size    0.005433435
grade3  0.178139021
nodes   0.049670523
pgr    -0.001812895
er      .          
Cox-LASSO selected: size grade3 nodes pgr 
Cox-LASSO not selected: age meno er 
### End Lasso selection 
## After lasso: size nodes pgr 
Default cuts included from Lasso: size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size) nodes <= mean(nodes) nodes <= median(nodes) nodes <= qlow(nodes) nodes <= qhigh(nodes) pgr <= mean(pgr) pgr <= median(pgr) pgr <= qlow(pgr) pgr <= qhigh(pgr) 
Categorical after Lasso: grade3 
Factors per GRF: er <= 0 
Initial GRF cuts included er <= 0 
Factors included per GRF (not in lasso) er <= 0 

===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 14 cut expressions once and caching...
Cut evaluation summary:
  Total cuts:  14 
  Valid cuts:  14 
  Errors:  0 
✓ All 14 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====

# of candidate subgroup factors= 14 
 [1] "er <= 0"      "size <= 29.3" "size <= 25"   "size <= 20"   "size <= 35"  
 [6] "nodes <= 5"   "nodes <= 3"   "nodes <= 1"   "nodes <= 7"   "pgr <= 110"  
[11] "pgr <= 32.5"  "pgr <= 7"     "pgr <= 131.8" "grade3"      
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 406 
Events criteria: control >= 10 , treatment >= 10 
Sample size criteria: n >= 60 
Subgroup search completed in 0.01 minutes

--- Filtering Summary ---
  Combinations evaluated: 406 
  Passed variance check: 374 
  Passed prevalence (>= 0.025 ): 374 
  Passed redundancy check: 354 
  Passed event counts (d0>= 10 , d1>= 10 ): 270 
  Passed sample size (n>= 60 ): 263 
  Cox model fit successfully: 263 
  Passed HR threshold (>= 1.25 ): 8 
-------------------------

Found 8 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 8 
Random seed set to: 8316951 
Two-stage parameters:
  n.splits.screen: 30 
  screen.threshold: 0.617 
  batch.size: 20 
  conf.level: 0.95 
# of unique initial candidates: 8 
# Restricting to top stop_Kgroups = 5 
# of candidates to evaluate: 5 
# Early stop threshold: 0.8 

================================================================================ 
TOP 5 CANDIDATE SUBGROUPS FOR CONSISTENCY EVALUATION
Sorted by: hr 
================================================================================ 

Rank   HR        N       Events  K    Subgroup Definition
-------------------------------------------------------------------------------- 
1      2.537     61      34      2    {er <= 0} & {size <= 35}
2      2.335     61      31      2    {er <= 0} & {nodes <= 7}
3      2.222     75      41      2    {er <= 0} & {pgr <= 32.5}
4      2.054     61      35      2    {er <= 0} & !{size <= 20}
5      1.992     64      34      2    {er <= 0} & {pgr <= 7}
-------------------------------------------------------------------------------- 
Parallel config: workers = 13 , batch_size = 1 
Batch 1 / 5 : candidates 1 - 1 

==================================================
EARLY STOP TRIGGERED (batch 1 )
  Candidate: 1 of 5 
  Pcons: 0.97 >= 0.8 
==================================================

Evaluated 1 of 5 candidates (early stop) 
1 subgroups passed consistency threshold

*** Subgroup found: {er <= 0} {size <= 35} 
% consistency criteria met= 0.97 
SG focus = hr 
Seconds and minutes forestsearch overall = 2.425 0.0404 
Consistency algorithm used: twostage 
Show code
plan("sequential")
timings$forestsearch <- (proc.time() - t0)["elapsed"]

cat("\nForestSearch completed in", 
    round(timings$forestsearch, 1), "seconds\n")

ForestSearch completed in 2.4 seconds

5.3 ForestSearch Results

5.3.1 Identified Subgroups

Show code
# Generate results tables
res_tabs <- sg_tables(fs, ndecimals = 3, which_df = "est")

# Display top subgroups meeting criteria
res_tabs$sg10_out
Identified Subgroups
Two-factor subgroups (maxk=2)
Factor 1 Factor 2 N Events E1 HR Pcons
{er <= 0} {size <= 35} 61 34 15 2.537 0.970
Search Configuration: Single-factor candidates (L) = 28; Maximum combinations evaluated = 406; Search depth (maxk) = 2
Search Results: Candidate subgroups found = 8; Maximum HR estimate = 2.54
Note: E1 = events in treatment arm; Pcons = consistency proportion

5.3.2 Treatment Effect Estimates

Show code
# ITT and subgroup estimates
res_tabs$tab_estimates
Treatment Effect Estimates
Training data estimates
Subgroup n n1 events m1 m0 RMST HR (95% CI)
ITT 686 (100.0%) 246 (35.9%) 299 (43.6%) 66.3 50.2 7.8 0.69 (0.54, 0.89)
Questionable1 61 (8.9%) 23 (37.7%) 34 (55.7%) 18.5 48 -19 2.54 (1.25, 5.17)
Recommend 625 (91.1%) 223 (35.7%) 265 (42.4%) 66.7 52.2 9.6 0.61 (0.47, 0.79)
1 Identified subgroup : {er <= 0} & {size <= 35}

5.3.3 Identified Subgroup Definition

Show code
cat("Identified subgroup (H):", paste(fs$sg.harm, collapse = " & "), "\n")
Identified subgroup (H): {er <= 0} & {size <= 35} 
Show code
cat("Subgroup size:", sum(fs$df.est$treat.recommend == 0), 
    sprintf("(%.1f%% of ITT)\n", 
            100 * mean(fs$df.est$treat.recommend == 0)))
Subgroup size: 61 (8.9% of ITT)

ForestSearch identifies Estrogen ≤ 0 (ER-negative) as the subgroup with potential harm. This is biologically plausible: tamoxifen is a selective estrogen receptor modulator with limited efficacy in ER-negative tumors.

6 Bootstrap Bias Correction

6.1 Rationale

Cox model estimates from identified subgroups are upwardly biased due to the selection process (subgroups are selected because they show extreme effects). Bootstrap bias correction addresses this by:

  1. Resampling with replacement
  2. Re-running the entire ForestSearch algorithm
  3. Computing bias terms from bootstrap vs. observed estimates
  4. Applying infinitesimal jackknife variance estimation

6.2 Running Bootstrap Analysis

Show code
# Number of bootstrap iterations
# Use 500-2000 for production; reduced here for vignette
NB <- 2

t0 <- proc.time()

fs_bc <- forestsearch_bootstrap_dofuture(
  fs.est = fs,
  nb_boots = NB,
  show_three = FALSE,
  details = FALSE
)

plan("sequential")
timings$bootstrap <- (proc.time() - t0)["elapsed"]

cat("\nBootstrap completed in", 
    round(timings$bootstrap / 60, 1), "minutes\n")

Bootstrap completed in 0.1 minutes

6.3 Bootstrap Summary and Diagnostics

Show code
# Comprehensive summary with diagnostics
summaries <- summarize_bootstrap_results(
  sgharm = fs$sg.harm,
  boot_results = fs_bc,
  create_plots = TRUE,
  est.scale = "hr"
)

===============================================================
           BOOTSTRAP ANALYSIS SUMMARY                          
===============================================================

IDENTIFIED SUBGROUP:
-------------------------------------------------------------
  H: {er <= 0} & {size <= 35}

BOOTSTRAP SUCCESS METRICS:
-------------------------------------------------------------
  Total iterations:              2
  Successful subgroup ID:        2 (100.0%)
  Failed to find subgroup:       0 (0.0%)

TIMING ANALYSIS:
-------------------------------------------------------------
Overall:
  Total bootstrap time:          0.05 minutes (0.00 hours)
  Average per iteration:         0.02 min (1.4 sec)
  Projected for 1000 boots:      23.19 min (0.39 hrs)
Show code
# Display bias-corrected estimates table
summaries$table
Treatment Effect by Subgroup
Bootstrap bias-corrected estimates (2 iterations)
Subgroup
Sample Size
Survival
Treatment Effect
N NT Events MedT MedC RMSTd HR
(95% CI)1
HR
(95% CI)2
Qstnbl3 61 (8.9%) 23 (37.7%) 34 (55.7%) 18.5 48 -19 2.54 (1.25, 5.17) 2.13 (0.15,29.84)
Recmnd 625 (91.1%) 223 (35.7%) 265 (42.4%) 66.7 52.2 9.6 0.61 (0.47, 0.79) 0.57 (0.08,3.95)
1 Unadjusted HR: Standard Cox regression hazard ratio with robust standard errors
2 Bias-corrected HR: Bootstrap-adjusted estimate using infinitesimal jackknife method (2 iterations). Corrects for optimism in subgroup selection.
3 Identified subgroup: {er <= 0} & {size <= 35}
Note: Med = Median survival time (months). RMSTd = Restricted mean survival time difference. Subgroup identified in 100.0% of bootstrap samples.

6.4 Kaplan-Meier by Identified Subgroups

Show code
 km_result <- plot_sg_weighted_km(
   fs.est = fs,
   outcome.name = "time_months",
   event.name = "status",
   treat.name = "hormon",
   show.logrank = FALSE,
   conf.int = TRUE,
   by.risk = 12,
   show.cox = FALSE, show.cox.bc = TRUE,
   fs_bc = fs_bc,
   hr_bc_position = "topright"
 )

Kaplan-Meier survival curves by identified subgroup

Note: Identified subgroup: {er <= 0} & {size <= 35}. HR(bc) = bootstrap bias-corrected hazard ratio. Medians [95% CI] for arms are un-adjusted.

6.4.1 Event Count Summary

Low event counts can lead to unstable HR estimates. This summary helps identify potential issues:

Show code
# note that default required minimum events is 12 for subgroup candidate
# Here we evaluate frequency of subgroup candidates in bootstrap samples less than 15
event_summary <- summarize_bootstrap_events(fs_bc, threshold = 15)

=== Bootstrap Event Count Summary ===
Total bootstrap iterations: 2
Event threshold: <15 events

ORIGINAL Subgroup H on BOOTSTRAP samples:
  Control arm <15 events: 0 (0.0%)
  Treatment arm <15 events: 0 (0.0%)
  Either arm <15 events: 0 (0.0%)

ORIGINAL Subgroup Hc on BOOTSTRAP samples:
  Control arm <15 events: 0 (0.0%)
  Treatment arm <15 events: 0 (0.0%)
  Either arm <15 events: 0 (0.0%)

NEW Subgroups found: 2 (100.0%)

NEW Subgroup H* on ORIGINAL data:
  Control arm <15 events: 1 (50.0% of successful)
  Treatment arm <15 events: 1 (50.0% of successful)
  Either arm <15 events: 1 (50.0% of successful)

NEW Subgroup Hc* on ORIGINAL data:
  Control arm <15 events: 0 (0.0% of successful)
  Treatment arm <15 events: 0 (0.0% of successful)
  Either arm <15 events: 0 (0.0% of successful)

6.4.2 Bootstrap Diagnostics

Show code
# Quality metrics
summaries$diagnostics_table_gt
Bootstrap Diagnostics Summary
Analysis of 2 bootstrap iterations
Category Metric Value
Success Rate Total iterations 2
Successful 2 (100.0%)
Failed 0 (0.0%)
Success rating Excellent
Subgroup H (Questionable) Observed HR 2.537
Bias-corrected HR 2.131
Bootstrap CV (%) 60.2%
N estimates 2
Subgroup Hc (Recommend) Observed HR 0.608
Bias-corrected HR 0.570
Bootstrap CV (%) 59.4%
N estimates 2

6.4.3 Subgroup Agreement

How consistently does bootstrap identify the same subgroup?

Show code
# Agreement with original analysis
if (!is.null(summaries$subgroup_summary$original_agreement)) {
  summaries$subgroup_summary$original_agreement
}
                            Metric                    Value
                            <char>                   <char>
1:      Total bootstrap iterations                        2
2:           Successful iterations                        2
3: Failed iterations (no subgroup)                        0
4:                                                         
5:    Original subgroup definition {er <= 0} & {size <= 35}
6:       Exact match with original                 0 (0.0%)
7:         Different from original               2 (100.0%)
8:   Partial match (shared factor)                 0 (0.0%)
Show code
# Factor presence across bootstrap iterations
if (!is.null(summaries$subgroup_summary$factor_presence)) {
  summaries$subgroup_summary$factor_presence
}
  Rank Factor Count Percent
1    1    age     1      50
2    2 grade3     1      50
3    3    pgr     1      50
4    4   size     1      50

6.4.4 Bootstrap Distributions

Show code
if (!is.null(summaries$plots)) {
  summaries$plots$H_distribution + summaries$plots$Hc_distribution
}

7 Cross-Validation

Cross-validation assesses the stability of the ForestSearch algorithm. Two approaches are available:

7.1 K-Fold Cross-Validation

Show code
# 10-fold CV with multiple iterations
# Use Ksims >= 50 for production
Ksims <- 1

t0 <- proc.time()

fs_kfold <- forestsearch_tenfold(
  fs.est = fs,
  sims = Ksims,
  Kfolds = 2,
  details = FALSE,
  parallel_args = list(
    plan = "multisession",
    workers = n_cores,
    show_message = FALSE
  )
)

plan("sequential")
timings$kfold <- (proc.time() - t0)["elapsed"]


metrics_tables <- cv_metrics_tables(fs_kfold)


metrics_tables
Cross-Validation Metrics
Subgroup: Identified Subgroup
Metric Description Value (%)
Agreement
Sensitivity (H) Agreement rate for subgroup H 11.5
Sensitivity (Hc) Agreement rate for complement Hc 89.8
PPV (H) Positive predictive value for H 9.9
PPV (Hc) Positive predictive value for Hc 91.2
Subgroup Finding
Any Found Any subgroup identified 50.0
Exact Match Exact match on all factors 0.0
At Least 1 At least one factor matches 50.0
Cov1 Any First covariate found (any cut) 0.0
Cov2 Any Second covariate found (any cut) 50.0
Cov1 & Cov2 Both covariates found 0.0
Cov1 Exact First covariate exact match 0.0
Cov2 Exact Second covariate exact match 50.0
Based on 1 simulation(s) with 2-fold CV. Values are proportions shown as percentages.
Show code
########################
# SECTION on De-bugging
########################

run_debug <- FALSE


if(run_debug){

fs.est <- fs
# Capture what one fold actually produces
fs_args <- fs.est$args_call_all
fs_args$parallel_args <- list(plan = "sequential", workers = 1, show_message = FALSE)
fs_args$details <- TRUE
fs_args$plot.sg <- FALSE

# Replicate the data prep from Section 3
confounders.name <- fs_args$confounders.name
outcome.name <- fs_args$outcome.name
event.name <- fs_args$event.name
treat.name <- fs_args$treat.name
id.name <- fs_args$id.name
get_names <- c(confounders.name, outcome.name, event.name, id.name, treat.name)

dfa <- fs.est$df.est[, c(get_names, "treat.recommend")]
names(dfa)[names(dfa) == "treat.recommend"] <- "treat.recommend.original"
dfnew <- as.data.frame(dfa)

# Replicate one fold
set.seed(8316951 + 1000)
id_sample <- sample(seq_len(nrow(dfnew)), replace = FALSE)
df_scrambled <- dfnew[id_sample, ]
folds <- cut(seq_len(nrow(df_scrambled)), breaks = 5, labels = FALSE)

x.train <- df_scrambled[folds != 1, ]
fs_args$df.analysis <- x.train

result <- try(do.call(forestsearch, fs_args), silent = FALSE)
print(result)


# Continue from where we left off — simulate a full 5-fold run
resCV_list <- vector("list", 5)

for (cv_index in 1:5) {
  x.test <- df_scrambled[folds == cv_index, ]
  x.train <- df_scrambled[folds != cv_index, ]
  
  fs_args$df.analysis <- x.train
  fs.train <- suppressWarnings(try(do.call(forestsearch, fs_args), TRUE))
  
  if (!inherits(fs.train, "try-error") && !is.null(fs.train$sg.harm)) {
    cat("Fold", cv_index, ": subgroup FOUND\n")
  } else {
    cat("Fold", cv_index, ": no subgroup\n")
    x.test$treat.recommend <- 1.0
    x.test$sg1 <- NA
    x.test$sg2 <- NA
    x.test$cvindex <- cv_index
  }
  
  resCV_list[[cv_index]] <- x.test
}

# Now test the part that actually fails:
resCV <- do.call(rbind, resCV_list)
res <- list(
  resCV = as.data.frame(resCV),
  Kfolds = 5,
  cv_args = fs_args,
  sg_analysis = fs.est$sg.harm,
  sg0.name = "Not recommend",
  sg1.name = "Recommend"
)

out <- try(forestsearch_KfoldOut(res = res, outall = FALSE, details = TRUE), silent = FALSE)
print(out)



# In your console, run ONE simulation manually without error swallowing:
fs_args <- fs.est$args_call_all
fs_args$parallel_args <- list(plan = "sequential", workers = 1, show_message = FALSE)
fs_args$details <- FALSE
fs_args$plot.sg <- FALSE

confounders.name <- fs_args$confounders.name
outcome.name <- fs_args$outcome.name
event.name <- fs_args$event.name
treat.name <- fs_args$treat.name
id.name <- fs_args$id.name
get_names <- c(confounders.name, outcome.name, event.name, id.name, treat.name)

dfa <- fs.est$df.est[, c(get_names, "treat.recommend")]
names(dfa)[names(dfa) == "treat.recommend"] <- "treat.recommend.original"
dfnew <- as.data.frame(dfa)

Kfolds <- 5

# Simulate one iteration
df_scrambled <- data.table::copy(dfnew)
set.seed(8316951 + 1000)
id_sample <- sample(seq_len(nrow(df_scrambled)), replace = FALSE)
df_scrambled <- df_scrambled[id_sample, ]
folds <- cut(seq_len(nrow(df_scrambled)), breaks = Kfolds, labels = FALSE)

resCV_list <- vector("list", Kfolds)

for (cv_index in seq_len(Kfolds)) {
  testIndexes <- which(folds == cv_index)
  x.test <- df_scrambled[testIndexes, ]
  x.train <- df_scrambled[-testIndexes, ]

  fs_args$df.analysis <- x.train
  fs.train <- suppressWarnings(try(do.call(forestsearch, fs_args), TRUE))

  if (!inherits(fs.train, "try-error") && !is.null(fs.train$sg.harm)) {
    
    print(fs.train$sg.harm)
    
   cat("Fold", cv_index, ": sg.harm =", fs.train$sg.harm, "\n")
    cat("Fold", cv_index, ": x.test cols =", head(names(x.test), 20), "\n")
    
    sg1 <- fs.train$sg.harm[1]
    sg2 <- if (length(fs.train$sg.harm) > 1) fs.train$sg.harm[2] else NA
    df.test <- get_dfpred(df.predict = x.test, sg.harm = fs.train$sg.harm, version = 2)
    df.test$cvindex <- cv_index
    df.test$sg1 <- sg1
    df.test$sg2 <- sg2
  } else {
    df.test <- x.test
    df.test$cvindex <- cv_index
    df.test$sg1 <- NA
    df.test$sg2 <- NA
    df.test$treat.recommend <- 1.0
  }

  # This is the line from the actual function — does it error?
  resCV_list[[cv_index]] <- df.test[, c(id.name, outcome.name, event.name, treat.name,
                                        "treat.recommend", "treat.recommend.original",
                                        "cvindex", "sg1", "sg2")]

  cat("Fold", cv_index, ": OK -", ncol(resCV_list[[cv_index]]), "cols,",
      nrow(resCV_list[[cv_index]]), "rows\n")
}

resCV <- do.call(rbind, resCV_list)
cat("rbind OK:", nrow(resCV), "rows\n")
}

7.2 Out-of-Bag (N-Fold) Cross-Validation

N-fold CV (leave-one-out):

Show code
t0 <- proc.time()

fs_OOB <- forestsearch_Kfold(
  fs.est = fs,
  details = FALSE,
  Kfolds = round(nrow(df.analysis)/100,0),  # N-fold = leave-one-out
  parallel_args = list(
    plan = "multisession",
    workers = n_cores,
    show_message = TRUE
  )
)

plan("sequential")
timings$oob <- (proc.time() - t0)["elapsed"]

# Summarize OOB results


cv_out <- forestsearch_KfoldOut(
  res = fs_OOB,
  details = FALSE,
  outall = TRUE
)

tables <- cv_summary_tables(cv_out)

tables$combined_table

tables$metrics_table

8 Results Visualization

8.1 Forest Plot

The forest plot summarizes treatment effects across the ITT population, reference subgroups, and identified subgroups with cross-validation metrics.

Show code
# Define reference subgroups for comparison
subgroups <- list(
  age_gt65 = list(
    subset_expr = "age > 65",
    name = "Age > 65",
    type = "reference"
  ),
  age_le65 = list(
    subset_expr = "age <= 65",
    name = "Age ≤ 65",
    type = "reference"
  ),
  pgr_positive = list(
    subset_expr = "pgr > 0",
    name = "PgR > 0",
    type = "reference"
  ),
  pgr_negative = list(
    subset_expr = "pgr <= 0",
    name = "PgR ≤ 0",
    type = "reference"
  )
)


my_theme <- create_forest_theme(base_size = 24, 
footnote_fontsize = 17, cv_fontsize = 22)


# Create forest plot
# Include fs_kfold and fs_OOB if available for CV metrics
result <- plot_subgroup_results_forestplot(
  fs_results = list(
    fs.est = fs,
    fs_bc = fs_bc,
    fs_OOB = NULL,
    fs_kfold = fs_kfold
  ),
  df_analysis = df.analysis,
  subgroup_list = subgroups,
  outcome.name = outcome.name,
  event.name = event.name,
  treat.name = treat.name,
  E.name = "Hormonal",
  C.name = "Chemo",
  ci_column_spaces = 25,
  xlog = TRUE,
  theme = my_theme
)


# Option 2: Custom sizing
render_forestplot(result)   

Subgroup forest plot including identified subgroups

8.1.1 KM Difference plots: ITT and subgroups

The solid black line denotes the ITT Kaplan-Meier treatment difference estimates along with 95%95% CIs (the grey shaded region). K-M differences corresponding to subgroups are displayed.

Show code
# Add additional subgroups along with ITT and identified subgroups
ref_sgs <- list(
age_young = list(subset_expr = "age < 65", color = "brown"),
age_old = list(subset_expr = "age >= 65", color = "orange")
)

plot_km_band_forestsearch(
 df = df.analysis,
   fs.est = fs,
 ref_subgroups = ref_sgs,
 outcome.name = outcome.name,
   event.name = event.name,
   treat.name = treat.name,
 draws_band = 20
)

Show code
# # Example with more subgroups
# ref_sgs <- list(
# pgr_positive = list(subset_expr = "pgr > 0", color ="green"),
# pgr_negative = list(subset_expr = "pgr <= 0", color = "purple"),
# age_young = list(subset_expr = "age < 65", color = "brown"),
# age_old = list(subset_expr = "age >= 65", color = "orange")
# )
# plot_km_band_forestsearch(
#  df = df.analysis,
#    fs.est = fs,
#  ref_subgroups = ref_sgs,
#  outcome.name = outcome.name,
#    event.name = event.name,
#    treat.name = treat.name
# )

9 Summary and Interpretation

9.1 Key Findings

Show code
# Extract key results
cat("=" %>% rep(60) %>% paste(collapse = ""), "\n")
============================================================ 
Show code
cat("FORESTSEARCH ANALYSIS SUMMARY\n")
FORESTSEARCH ANALYSIS SUMMARY
Show code
cat("=" %>% rep(60) %>% paste(collapse = ""), "\n\n")
============================================================ 
Show code
cat("Dataset: GBSG (N =", nrow(df.analysis), ")\n")
Dataset: GBSG (N = 686 )
Show code
cat("Outcome: Recurrence-free survival\n\n")
Outcome: Recurrence-free survival
Show code
cat("ITT Analysis:\n")
ITT Analysis:
Show code
cat("  HR (95% CI): 0.69 (0.54, 0.89)\n\n")
  HR (95% CI): 0.69 (0.54, 0.89)
Show code
cat("Identified Subgroup (H):\n")
Identified Subgroup (H):
Show code
cat("  Definition:", paste(fs$sg.harm, collapse = " & "), "\n")
  Definition: {er <= 0} & {size <= 35} 
Show code
cat("  Size:", sum(fs$df.est$treat.recommend == 0), 
    sprintf("(%.1f%%)\n", 100 * mean(fs$df.est$treat.recommend == 0)))
  Size: 61 (8.9%)
Show code
cat("  Unadjusted HR:", sprintf("%.2f", exp(fs$grp.consistency$out_sg$result$hr[1])), "\n")
  Unadjusted HR: 12.64 
Show code
cat("\nComplement Subgroup (Hc):\n")

Complement Subgroup (Hc):
Show code
cat("  Size:", sum(fs$df.est$treat.recommend == 1),
    sprintf("(%.1f%%)\n", 100 * mean(fs$df.est$treat.recommend == 1)))
  Size: 625 (91.1%)

9.2 Clinical Interpretation

The ForestSearch analysis identifies estrogen receptor-negative (ER ≤ 0) patients as a subgroup with potential lack of benefit from hormonal therapy.

Biological plausibility: Tamoxifen is a selective estrogen receptor modulator. Its efficacy depends on ER expression. The finding that ER-negative patients may not benefit is consistent with:

  • Mechanistic understanding of tamoxifen action
  • Meta-analyses showing no tamoxifen benefit in ER-negative breast cancer
  • Clinical guidelines recommending tamoxifen primarily for ER-positive tumors

Caveats:

  1. This is an exploratory analysis requiring independent validation
  2. The bias-corrected estimates have wider confidence intervals
  3. Cross-validation metrics should be evaluated for algorithm stability

9.3 Computational Timing

Show code
timings$total <- (proc.time() - t_vignette_start)["elapsed"]

timing_df <- data.frame(
  Analysis = c("GRF", "ForestSearch", "Bootstrap", "Kfold","Total"),
  Seconds = c(
    timings$grf,
    timings$forestsearch,
    timings$bootstrap,
    timings$kfold,
    timings$total
  )
)
timing_df$Minutes <- timing_df$Seconds / 60

gt(timing_df) |>
  tab_header(title = "Computational Timing") |>
  fmt_number(columns = c(Seconds, Minutes), decimals = 1) |>
  cols_label(
    Analysis = "Component",
    Seconds = "Time (sec)",
    Minutes = "Time (min)"
  )
Computational Timing
Component Time (sec) Time (min)
GRF 0.2 0.0
ForestSearch 2.4 0.0
Bootstrap 4.0 0.1
Kfold 2.4 0.0
Total 10.9 0.2

10 References

León LF, Jemielita T, Guo Z, Marceau West R, Anderson KM (2024). “Exploratory subgroup identification in the heterogeneous Cox model: A relatively simple procedure.” Statistics in Medicine. DOI: 10.1002/sim.10163

11 Session Information

Show code
sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] weightedsurv_0.1.0      patchwork_1.3.2         doRNG_1.8.6.2          
 [4] rngtools_1.5.2          doFuture_1.2.0          future_1.69.0          
 [7] foreach_1.5.2           policytree_1.2.3        grf_2.5.0              
[10] gt_1.3.0                ggplot2_4.0.1           data.table_1.18.0      
[13] survival_3.8-6          forestsearch_0.0.0.9000

loaded via a namespace (and not attached):
 [1] gtable_0.3.6         shape_1.4.6.1        xfun_0.53           
 [4] htmlwidgets_1.6.4    visNetwork_2.1.4     lattice_0.22-7      
 [7] vctrs_0.6.5          tools_4.5.1          generics_0.1.4      
[10] parallel_4.5.1       tibble_3.3.0         pkgconfig_2.0.3     
[13] Matrix_1.7-3         forestploter_1.1.3   RColorBrewer_1.1-3  
[16] S7_0.2.0             lifecycle_1.0.4      compiler_4.5.1      
[19] farver_2.1.2         stringr_1.6.0        codetools_0.2-20    
[22] litedown_0.7         htmltools_0.5.8.1    sass_0.4.10         
[25] yaml_2.3.10          glmnet_4.1-10        pillar_1.11.1       
[28] iterators_1.0.14     parallelly_1.46.1    commonmark_2.0.0    
[31] tidyselect_1.2.1     digest_0.6.37        stringi_1.8.7       
[34] dplyr_1.1.4          listenv_0.9.1        labeling_0.4.3      
[37] splines_4.5.1        fastmap_1.2.0        grid_4.5.1          
[40] cli_3.6.5            magrittr_2.0.4       DiagrammeR_1.0.11   
[43] base64enc_0.1-3      randomForest_4.7-1.2 future.apply_1.20.1 
[46] withr_3.0.2          scales_1.4.0         rmarkdown_2.30      
[49] globals_0.18.0       gridExtra_2.3        progressr_0.18.0    
[52] evaluate_1.0.5       knitr_1.51           markdown_2.0        
[55] rlang_1.1.6          Rcpp_1.1.0           glue_1.8.0          
[58] xml2_1.4.0           rstudioapi_0.17.1    jsonlite_2.0.0      
[61] R6_2.6.1             fs_1.6.6